


Lesson: Determining the Need for Record Deduplication 


Once matches have been identified, data from these match groups can be salvaged and 
posted to form a single best record or posted to all matching records to update them. 
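
The two posting options described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular product implements posting: the function names `build_best_record` and `post_to_all` are hypothetical, the "first non-empty value wins" rule is a simplification, and the sample values are borrowed from the example records later in this lesson.

```python
def build_best_record(group):
    """Form one record, taking the first non-empty value seen for each field."""
    best = {}
    for record in group:
        for field, value in record.items():
            if value and not best.get(field):
                best[field] = value
    return best


def post_to_all(group):
    """Return copies of every record with its empty fields filled from the group."""
    best = build_best_record(group)
    # Each record keeps its own non-empty values; only gaps are filled.
    return [{**best, **{f: v for f, v in record.items() if v}} for record in group]


# Hypothetical match group: two records already identified as the same person.
group = [
    {"name": "Ms Margaret Smith-Kline Ph.D.", "ssn": "", "zip": "10013-1933"},
    {"name": "Maggie Smith", "ssn": "001-12-4367", "zip": ""},
]

print(build_best_record(group))  # one salvaged record for the whole group
print(post_to_all(group))        # every record updated in place of its gaps
```

Posting to a single best record leaves one surviving row per match group, while posting to all matching records keeps every row and only fills in what each one was missing.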


Basic Concepts — Match and Consolidate 


• Matching identifies duplicate records within multiple sources, such as tables, files, and so on.

  - Matching can compare records in multiple directions, and then join the intersections.

  - Matching can use various fields for comparisons: name, address, phone number, account numbers, operational data, and user-defined fields like product ID, description, and so on.


• Consolidation either eliminates, filters, or combines duplicate records using configurable rules.

  - Consolidation can build a best record, allowing best field selection based on your priorities: source, frequency, completeness, recency, and so on.

  - Consolidation builds reference keys to track individual records and their associations across multiple databases.
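
As a rough illustration of the match and consolidate concepts above, the following Python sketch groups records from several sources and builds a best record for each group. Everything in it is an assumption made for the example: the match rule (identical email after normalizing case and whitespace) is far simpler than real matching, the priority rule (most complete value, then most recent record) is just one possible configuration, and the source names, IDs, update dates, and the helper names `match_key`, `find_match_groups`, and `consolidate` are hypothetical.

```python
from collections import defaultdict
from datetime import date


def match_key(record):
    """Hypothetical match rule: same email address, ignoring case and spaces."""
    return record["email"].strip().lower()


def find_match_groups(records):
    """Group records from any number of sources that share a match key."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    # Only groups with more than one member are duplicates.
    return [g for g in groups.values() if len(g) > 1]


def consolidate(group, fields):
    """Build a best record: per field, prefer the most complete (longest)
    value, breaking ties with the most recently updated record."""
    best = {}
    for field in fields:
        candidates = [r for r in group if r.get(field)]
        if candidates:
            winner = max(candidates, key=lambda r: (len(r[field]), r["updated"]))
            best[field] = winner[field]
    # Reference keys: which source rows are associated with this best record.
    best["reference_keys"] = [(r["source"], r["id"]) for r in group]
    return best


# Hypothetical input drawn from two sources; sources, IDs, and dates are made up.
records = [
    {"source": "crm", "id": 1, "name": "Ms Margaret Smith-Kline Ph.D.",
     "company": "Future Electronics",
     "email": "maggie.kline@future_electronics.com", "updated": date(2003, 5, 23)},
    {"source": "billing", "id": 7, "name": "Maggie Smith",
     "company": "Future Electronics Co. LLC",
     "email": "maggie.kline@future_electronics.com", "updated": date(2003, 1, 1)},
    {"source": "crm", "id": 2, "name": "John Doe", "company": "Acme",
     "email": "jdoe@example.com", "updated": date(2003, 2, 2)},
]

for group in find_match_groups(records):
    print(consolidate(group, ["name", "company", "email"]))
```

Keeping the reference keys alongside the best record means the original rows in each source can still be located after consolidation, whichever fields were chosen.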


Table 29: Input Records and Consolidated Record Example 


Input Records 


Ms Margaret Smith-Kline Ph.D.
Future Electronics
101 Avenue of the Americas
New York NY 10013-1933
maggie.kline@future_electronics.com
May 23, 2003

Maggie Smith
Future Electronics Co. LLC
101 6th Ave.
Manhattan, NY 10012
maggie.kline@future_electronics.com
001-12-4367

Consolidated Record

Name: Ms. Margaret Smith-Kline Ph.D.
Company name: Future Electronics Co. LLC
SSN: 001-12-4367
Purchase date: 5/23/2003
Address: 101 Avenue of the Americas
         New York, NY 10013-1933
Latitude: 40.722970
Longitude: -74.005035
Fed code: 36061
Phone: (222) 922-9922


Email: